1.1 - What is RL for SC2?
The documentation so far has focused on building scripted agents. In that paradigm, you, the developer, explicitly write the rules (if/then/else
logic) that determine the bot's behavior. Reinforcement Learning (RL) introduces a fundamentally different approach: instead of writing the rules, you define a goal, and the agent learns its own rules through trial and error.
This section will guide you through building a learned agent, where the "brain" of the bot is a neural network model trained using libraries like stable-baselines3.
Two Paradigms: Scripted vs. Learned
Feature | Scripted Agent (Traditional Bot) | Learned Agent (RL Bot) |
---|---|---|
Developer's Role | Write explicit rules for every scenario. | Design the learning environment, observation, actions, and reward signal. |
Decision Making | Based on a pre-defined logic tree. | Based on a learned policy that maps observations to actions. |
Behavior | Predictable and consistent. | Can be creative and discover non-obvious strategies. |
Primary Challenge | Anticipating all possible game states. | Designing a reward function that leads to desired behavior; long training times. |
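To make the contrast concrete, here is the same decision expressed both ways. This is a minimal sketch: the rule, the observation layout, and the trained `model` are hypothetical placeholders, not code from this project.

```python
import numpy as np

# Scripted: the developer hard-codes the rule.
def scripted_decision(minerals: int, supply_left: int) -> str:
    if minerals >= 50 and supply_left > 0:
        return "build worker"
    return "do nothing"

# Learned: a trained stable-baselines3 model maps an observation
# vector to an action index; the "rule" lives in its weights.
# (Assumes `model` was trained earlier, e.g. with PPO.)
def learned_decision(model, minerals: int, supply_left: int) -> int:
    obs = np.array([minerals, supply_left], dtype=np.float32)
    action, _states = model.predict(obs, deterministic=True)
    return int(action)  # e.g. 0 = "do nothing", 1 = "build worker"
```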
The Core RL Workflow
At its heart, RL is a feedback loop between an Agent and an Environment.
    +------------------+            +------------------+
    |                  |            |                  |
    |      Agent       |   Action   |   Environment    |
    |  (Your RL Model) |----------->|  (StarCraft II)  |
    |                  |            |                  |
    +--------^---------+            +---------v--------+
             |                                |
             |     Observation + Reward       |
             +--------------------------------+
- The Agent observes the state of the Environment.
- Based on the observation, the Agent chooses an Action.
- The Environment updates based on the Action and returns a new Observation and a Reward signal.
- The Agent uses the Reward to update its internal strategy (its Policy) to take better actions in the future.
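In code, this feedback loop is the standard gymnasium interaction pattern. A minimal sketch, assuming a custom environment class named `SC2Env` (hypothetical until we build it later in this section):

```python
env = SC2Env()  # hypothetical: our custom StarCraft II environment

obs, info = env.reset()            # the initial Observation
terminated = truncated = False
while not (terminated or truncated):
    # Choose an Action (random here; a trained Agent would use its Policy).
    action = env.action_space.sample()
    # The Environment advances and returns a new Observation and a Reward.
    obs, reward, terminated, truncated, info = env.step(action)
    # During training, the Agent uses `reward` to improve its Policy.
env.close()
```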
Key Terminology
Term | Definition in our SC2 Context |
---|---|
Agent | The stable-baselines3 model we will train. This is the "brain." |
Environment | The StarCraft II game, wrapped in a custom gymnasium interface. |
Observation | A numerical representation of the game state (e.g., an array with minerals, supply, unit health). |
Action | A discrete choice the Agent makes (e.g., 0 for "do nothing," 1 for "build worker"). |
Reward | A numerical score (+1, -10, etc.) we give the Agent to reinforce good behavior and punish bad behavior. |
Policy | The internal strategy of the Agent; essentially, a function that decides which action to take for a given observation. |
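In gymnasium terms, the Observation and Action rows of this table become the environment's declared spaces. A sketch with an illustrative three-feature observation and two-action layout; the exact features and actions are placeholders, not this project's final design:

```python
import numpy as np
from gymnasium import spaces

# Observation: a fixed-size numeric array, e.g.
# [minerals, supply_left, worker_count], scaled into the 0..1 range.
observation_space = spaces.Box(low=0.0, high=1.0, shape=(3,), dtype=np.float32)

# Action: an integer choice, e.g. 0 = "do nothing", 1 = "build worker".
action_space = spaces.Discrete(2)
```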
Our Goal for This Section
Our project is to build the bridge between python-sc2 and stable-baselines3, then train agents to complete specific tasks:
- Task 1: Create a reusable gymnasium environment for StarCraft II.
- Task 2: Train an agent to perform a simple economic task.
- Task 3: Train an agent to make a simple macroeconomic trade-off.
- Task 4: Train an agent to perform a simple combat micro task.
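Once Task 1 gives us a gymnasium-compatible environment, training follows the standard stable-baselines3 pattern. A sketch, again assuming the hypothetical `SC2Env` class; the algorithm, timestep budget, and filename are illustrative:

```python
from stable_baselines3 import PPO

env = SC2Env()  # hypothetical: the gymnasium environment from Task 1

# PPO is one of several algorithms stable-baselines3 provides;
# "MlpPolicy" means a small fully connected network serves as the Policy.
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)  # illustrative training budget
model.save("sc2_agent")               # hypothetical filename
```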
Why is RL Challenging in StarCraft II?
Applying RL to a complex game like StarCraft II is difficult for two main reasons:
- Massive State Space: The number of possible game states is nearly infinite. We must carefully select a small subset of features for our agent to observe (see the sketch after this list).
- Credit Assignment Problem: A game can last for many minutes. If you win, was it because of a good decision you made 30 seconds ago or 10 minutes ago? Designing reward functions that correctly assign credit is a major challenge.
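As an illustration of taming the state space, an observation builder can compress the full game into a handful of normalized numbers. The chosen features and scaling constants below are placeholders, not prescriptions:

```python
import numpy as np

def build_observation(bot) -> np.ndarray:
    """Reduce the full SC2 game state to a few normalized features.

    `bot` is a python-sc2 BotAI instance; the feature list and the
    normalization constants here are illustrative only.
    """
    return np.array(
        [
            min(bot.minerals / 1000.0, 1.0),  # economy: banked minerals
            bot.supply_left / 200.0,          # room to grow
            bot.workers.amount / 100.0,       # current worker count
        ],
        dtype=np.float32,
    )
```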
In the following chapters, we will address these challenges by starting with very simple, focused tasks.